Loading data from Cloudant or CouchDB

You can load data from CouchDB or a managed Cloudant instance using the Cloudant Spark connector.

Prerequisites

Collect your database connection information: the database host, user name, password, and source database name.

If your Cloudant instance was provisioned in Bluemix, you can find the connection information in the _Service Credentials_ tab.
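
If you copy the JSON shown in the _Service Credentials_ tab, the connection settings can be read from it in code rather than retyped. The following is a minimal sketch, assuming the credentials JSON contains `host`, `username` and `password` keys; verify the key names against your own instance. The helper name `connection_from_credentials` and the sample values are hypothetical placeholders.

```python
import json


def connection_from_credentials(raw_json):
    """Extract (host, username, password) from a credentials JSON string.

    Assumes the JSON has "host", "username" and "password" keys.
    """
    credentials = json.loads(raw_json)
    return credentials["host"], credentials["username"], credentials["password"]


# Placeholder credentials for illustration only; not a real account.
sample = (
    '{"username": "example-user", '
    '"password": "example-password", '
    '"host": "example-user.cloudant.com"}'
)
sample_host, sample_username, sample_password = connection_from_credentials(sample)
print(sample_host)
```

A missing key raises `KeyError`, which makes an incomplete credentials document visible before any connection attempt.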

Import PixieDust and enable the Apache Spark Job monitor


In [7]:
import pixiedust
pixiedust.enableJobMonitor()


Configure database connectivity

Customize this cell with your Cloudant/CouchDB connection information.


In [8]:
# @hidden_cell
# Enter your Cloudant host name
host = '...'
# Enter your Cloudant user name
username = '...'
# Enter your Cloudant password
password = '...'
# Enter your source database name
database = '...'
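
Before loading, it can help to confirm that every `'...'` placeholder in the configuration cell has been replaced. The check below is a small sketch and not part of the connector; the function name `unset_settings` and the example values are hypothetical.

```python
def unset_settings(settings, placeholder="..."):
    """Return the sorted names of settings that are empty or still hold the placeholder."""
    return sorted(
        name for name, value in settings.items()
        if not value or value == placeholder
    )


# Example with hypothetical values: only the host has been filled in.
example = {
    "host": "example-user.cloudant.com",
    "username": "...",
    "password": "...",
    "database": "...",
}
print(unset_settings(example))  # ['database', 'password', 'username']
```

In the notebook, the same function could be called with the real variables, for example `unset_settings({"host": host, "username": username, "password": password, "database": database})`; an empty list means all four values were customized.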


Load documents from the database

Load the documents into an Apache Spark DataFrame.


In [6]:
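# no changes are required to this cell
from pyspark.sql import SQLContext
# obtain Spark SQL Context (sc is the SparkContext that the notebook is expected to predefine)
sqlContext = SQLContext(sc)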
# load data
cloudant_data = sqlContext.read.format("com.cloudant.spark").\
                                      option("cloudant.host", host).\
                                      option("cloudant.username", username).\
                                      option("cloudant.password", password).\
                                      load(database)


Use connectorVersion=2.0.0, dbName=dsx_load_demo, indexName=null, viewName=null,jsonstore.rdd.partitions=10, jsonstore.rdd.maxInPartition=-1,jsonstore.rdd.minInPartition=10, jsonstore.rdd.requestTimeout=900000,bulkSize=200, schemaSampleSize=-1
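
The three connection options passed in the load cell can also be collected into a dictionary, which keeps them in one place if several databases are loaded from the same account. This is a sketch under that assumption; the helper name `cloudant_options` and the sample values are hypothetical, and only the option keys come from the cell above.

```python
def cloudant_options(host, username, password):
    """Collect the connector options used in the load cell into one dict."""
    return {
        "cloudant.host": host,
        "cloudant.username": username,
        "cloudant.password": password,
    }


# Placeholder values for illustration only.
opts = cloudant_options("example-user.cloudant.com", "example-user", "example-password")

# With a live Spark context in the notebook, the dict could be passed as:
#   sqlContext.read.format("com.cloudant.spark").options(**opts).load(database)
print(sorted(opts))
```

`DataFrameReader.options(**opts)` sets the same keys as the chained `option(...)` calls, so the resulting DataFrame should be equivalent.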

Explore the loaded data using PixieDust

Select the DataFrame view to inspect the metadata and explore the data by choosing a chart type and chart options.


In [9]:
display(cloudant_data)


For information on how to load data from other sources, refer to [these code snippets](https://apsportal.ibm.com/docs/content/analyze-data/python_load.html).
